Skip to content

[fix](scan) Respect byte budget when merging scan blocks#63296

Merged
yiguolei merged 3 commits into
branch-4.1from
codex/fix-scanner-merge-byte-budget-4.1
May 22, 2026
Merged

[fix](scan) Respect byte budget when merging scan blocks#63296
yiguolei merged 3 commits into
branch-4.1from
codex/fix-scanner-merge-byte-budget-4.1

Conversation

@mrhhsg
Copy link
Copy Markdown
Member

@mrhhsg mrhhsg commented May 15, 2026

Summary

Fix scanner scheduler block merging so the adaptive batch size byte budget is respected when multiple scanned blocks are stitched into a cached block.

Root Cause

The scheduler merge path only checked the row count against batch_size(). When adaptive batch size produced multiple blocks that were individually acceptable, the scheduler could still merge them into a much larger block because it ignored preferred_block_size_bytes().

Changes

  • Capture preferred_block_size_bytes() for the scan task.
  • Merge into the last cached block only when both the row budget and byte budget are satisfied.
  • Keep empty-block merge behavior unchanged so eos/filtered-empty blocks are not emitted separately.
  • Preserve allocated_bytes() for memory accounting while using bytes() for the adaptive data-size budget.

Validation

  • git diff --check -- be/src/exec/scan/scanner_scheduler.cpp
  • ninja -C be/ut_build_ASAN src/exec/CMakeFiles/Exec.dir/scan/scanner_scheduler.cpp.o

Note: ./run-be-ut.sh --run --filter=ScannerContextTest.* was started earlier but stopped after it triggered a broad ASAN UT build; the changed object had already compiled successfully.

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg mrhhsg changed the title [codex] Respect byte budget when merging scan blocks [fix](scan) Respect byte budget when merging scan blocks May 15, 2026
@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 15, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full PR diff and existing review context. No existing inline comments were present, and there was no additional user focus.

Critical checkpoints: correctness has a blocking issue because disabled adaptive batch sizing now still enforces a 512MB merge cap; compatibility/regression risk is therefore not acceptable until fixed. I did not identify concurrency, memory-accounting, security, or API-surface issues beyond this behavior regression. Tests/build were not run by me in this review environment; the PR description reports object compilation only.

Comment thread be/src/exec/scan/scanner_scheduler.cpp Outdated
@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 19, 2026

/review

2 similar comments
@zclllyybb
Copy link
Copy Markdown
Contributor

/review

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 20, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review summary for head 4916f6f:

Overall opinion: approve / no new blocking issues found in this review pass.

Critical checkpoints:

  • Correctness: The scanner merge path still enforces the row budget, adds the adaptive byte-budget check only when adaptive batch sizing is enabled, and preserves the empty-block merge behavior. The previously reported disabled-adaptive regression is not repeated here because the current head explicitly returns row-only behavior when config::enable_adaptive_batch_size is false.
  • Compatibility / behavior regression: I did not find an additional compatibility regression in the one-file diff. Disabled adaptive mode matches the old row-budget-only behavior; enabled adaptive mode now prevents merged cached blocks from exceeding preferred_block_size_bytes by data bytes.
  • Resource accounting / memory: The change continues to use allocated_bytes() for block usage accounting and bytes() for adaptive data-size budgeting, which is consistent with the PR intent. No additional accounting bug was identified.
  • Tests / validation: I did not run the full BE unit test suite in this review environment. I did run git diff --check for the PR diff, which passed. The PR description reports object-level compilation validation.
  • Existing review context: The existing inline thread in be/src/exec/scan/scanner_scheduler.cpp about adaptive-disabled behavior was treated as already-known and was not duplicated; the latest commit appears to address it.
  • User focus points: review_focus.txt says there are no additional user-provided focus points, so no focus-specific issues were found.

@mrhhsg
Copy link
Copy Markdown
Member Author

mrhhsg commented May 21, 2026

run buildall

@mrhhsg mrhhsg marked this pull request as ready for review May 22, 2026 01:55
@mrhhsg mrhhsg requested a review from yiguolei as a code owner May 22, 2026 01:55
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@yiguolei yiguolei merged commit 880e9e0 into branch-4.1 May 22, 2026
32 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants